Similar resources
Audio-Visual Identity Grounding for Enabling Cross Media Search
Automatically searching for media clips in large heterogeneous datasets is an inherently difficult challenge, and nearly impossibly so when searching across distinct media types (e.g. finding audio clips that match an image). In this paper we introduce the exploitation of identity grounding for enabling this cross media search and exploration capability. Through the use of grounding we leverage...
Illinois Cross-Lingual Wikifier: Grounding Entities in Many Languages to the English Wikipedia
We release a cross-lingual wikification system for all languages in Wikipedia. Given a piece of text in any supported language, the system identifies names of people, locations, and organizations, and grounds these names to the corresponding English Wikipedia entries. The system is based on two components: a cross-lingual named entity recognition (NER) model and a cross-lingual mention grounding mod...
Intra-Lingual and Cross-Lingual Prosody Modelling
Statistical Parametric Speech Synthesis (SPSS) offers flexibility and computational advantage compared to other methods for Text-to-Speech Synthesis. While the speech output is intelligible, statistically trained voices are less natural due to the amount of signal processing and statistical averaging that goes into building the models. Much of the blame for the lack of naturalness falls on the ...
Active Grounding of Visual Situations
We address a key problem for computer vision: retrieving images that are instances of visual situations. Visual situations are concepts such as “a boxing match”, “a birthday party”, “walking the dog”, “a crowd waiting for a bus,” “a handshake”, or “a game of ping-pong,” whose instantiations in images are linked more by their common spatial and semantic structure than by low-level visual similar...
Grounding Visual Explanations (Extended Abstract)
Existing models [2] which generate textual explanations enforce task relevance through a discriminative term loss function, but such mechanisms only weakly constrain mentioned object parts to actually be present in the image. In this paper, a new model is proposed for generating explanations by utilizing localized grounding of constituent phrases in generated explanations to ensure image releva...
Journal
Journal title: IEEE Access
Year: 2021
ISSN: 2169-3536
DOI: 10.1109/access.2020.3046719